- Disclaimer: This tool is designed for research and is not intended for diagnostic purposes.
- Version of TRexs: 0.3
- Number of disease samples: 6
- Number of control samples: 118
- Number of repeats genotyped: 56
Distribution of repeat expansion relative to HPRC controls
- Mouse over each point for detailed sample information.
- Mouse over a color block for disease-related information.
- Left click to draw a rectangle to zoom into any region of the plot.
- Note that the number of copies shown here is the sum over all expanded motifs and is not specific to motifs known to cause disease.
- By default, the normal, premutation and pathogenic thresholds are curated using the DRED database as a base. Developers may manually curate and modify the thresholds to the best of their knowledge.
- The transparency of the points for the disease cohort corresponds to the isofor_score, the score from the method used to detect outliers based on allelic length (regardless of motif). The higher the score, the rarer the sample’s repeat expansion.
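To make the threshold bands concrete, the following is a minimal sketch of how a total copy number could be assigned to the normal, premutation or pathogenic range. The `Thresholds` type, the `classify` function, the example cut-offs and the choice of strict "greater than" comparisons are all assumptions made for illustration; they are not taken from TRexs or from DRED.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    # Hypothetical per-repeat cut-offs, expressed in copies.
    normal: int      # copies above this are no longer considered normal
    pathogenic: int  # copies above this are considered pathogenic


def classify(copies: int, t: Thresholds) -> str:
    """Assign a total copy number to a band (assumed boundary semantics)."""
    if copies > t.pathogenic:
        return "pathogenic"
    if copies > t.normal:
        return "premutation"
    return "normal"


# Made-up cut-offs, for illustration only.
example = Thresholds(normal=30, pathogenic=100)
bands = [classify(c, example) for c in (10, 50, 150)]
```

The copy number passed in would be the sum over all motifs on an allele, which is why a sample can fall in the pathogenic band without carrying a known pathogenic motif.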
Table of potentially significant repeats
- Curated repeat expansions with high prevalence not shown here:
- The top row of the table can be used to filter the samples. For example, type “Yes” in the column “Pathogenic motif high?” to look for samples with expanded motifs known to cause disease (note that this will remove samples with novel motifs).
- You can also use expressions like “>10” in numerical columns to filter based on cut-offs.
- This table is generated based on the following logic:
- MC tags from the trgt output are used to get the total number of copies on each allele (e.g. “50_50” represents 50 copies for each motif genotyped, which sums to 100).
- All samples are first filtered to keep those in which the maximum number of copies expanded by any motif exceeds the known pathogenic threshold. This means that if there are 1000 copies of TAAAA, we will output the sample even though TAAAA is not the known pathogenic motif.
- For the remaining samples, we further filter based on known inheritance patterns. E.g. in RFC1 both alleles need to be expanded beyond the pathogenic threshold.
- However, if known pathogenic motifs are found in a sample, the sample is tested specifically for that motif to produce “Pathogenic motif high?”. E.g. in BEAN1 we test specifically whether TGGAA has more copies than the known threshold.
- Finally, if >10% or n>=5 of the control samples have more copies than the pathogenic threshold, we check whether the disease samples are expanded beyond the minimum number of copies in the group of control samples with high expansion, while considering the inheritance pattern as above.
- If more than 10 samples are being analyzed, we filter out any genes in which more than half of the samples are expanded in the repeat, as these are likely not disease-causing.
- Note that the genotyped motif for some repeats may differ from conventional testing. E.g. ATXN2 is genotyped with GCT instead of the usual CAG, so there may be an offset of 1 copy.
- The outlier score is determined using an isolation forest (package isotree with a default of n=100 trees).
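The filtering steps above can be sketched roughly as follows. This is a simplified illustration and not the TRexs implementation: all function and parameter names are hypothetical, each allele is assumed to arrive as its own underscore-separated MC string, "exceeds" is read as a strict comparison, and the control values are assumed to be already reduced to one copy number per control sample according to the inheritance pattern.

```python
def allele_totals(mc_alleles: list[str]) -> list[int]:
    """Sum the per-motif counts of each allele, e.g. "50_50" -> 100."""
    return [sum(int(n) for n in allele.split("_")) for allele in mc_alleles]


def passes_threshold(copies: list[int], cutoff: int, recessive: bool = False) -> bool:
    """Dominant: any allele above the cutoff. Recessive: every allele above it."""
    expanded = [c > cutoff for c in copies]
    return all(expanded) if recessive else any(expanded)


def pathogenic_motif_high(mc_alleles: list[str], motif_index: int,
                          cutoff: int, recessive: bool = False) -> bool:
    """Apply the same test to one specific motif instead of the allele total."""
    counts = [int(allele.split("_")[motif_index]) for allele in mc_alleles]
    return passes_threshold(counts, cutoff, recessive)


def control_adjusted_cutoff(control_copies: list[int], cutoff: int) -> int:
    """Raise the cutoff when expansions are common among controls.

    If more than 10% of controls, or at least 5 controls, exceed the
    pathogenic cutoff, use the smallest of those high control values.
    """
    high = [c for c in control_copies if c > cutoff]
    common = len(high) >= 5 or len(high) > 0.10 * len(control_copies)
    return min(high) if common else cutoff


# Made-up sample: one allele with 50 copies of each of two motifs, one short allele.
sample = ["50_50", "12_0"]
totals = allele_totals(sample)
dominant_hit = passes_threshold(totals, cutoff=60)
recessive_hit = passes_threshold(totals, cutoff=60, recessive=True)
motif_hit = pathogenic_motif_high(sample, motif_index=1, cutoff=40)
raised = control_adjusted_cutoff([10] * 10 + [70, 80], cutoff=60)
```

In this made-up example the sample passes under a dominant model but not under a recessive one, because only one allele exceeds the cutoff. Two of the twelve controls exceed the cutoff, which is more than 10%, so the disease sample would instead be compared against the lowest high control value.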
TRVZ Visualization of Potentially Pathogenic Repeats
- By default, only samples with matching pathogenic motifs (“Pathogenic motif high?” column above) are visualized here.
Parameters
show_high_prev_gene : FALSE
version : 0.3
sample_sheet : /home/kpin/pb_bitbucket/TRexs/control_vcf/sample_sheet_HG001-4.tsv
control_tsv : /home/kpin/pb_bitbucket/TRexs/resources/control_samples_repeat_2022-10-20.tsv.gz
trvz_binary : /home/kpin/softwares/trgt/trvz/target/release/trvz
pathogenic_bed : /home/kpin/pb_bitbucket/TRexs/resources/pathogenic_repeats.hg38.bed
hg38 : /nrt-data/downstream/variants_calling/2022-2-6_KKH_neuro_10samples/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta
repeats_db : /home/kpin/pb_bitbucket/TRexs/resources/repeats_information.tsv
high_prev_genes :
odir : /home/kpin/pb_bitbucket/TRexs/control_vcf/trgt_report_v0.1.1_2022-10-20